Mining search engine query logs via suggestion sampling

Authors

  • Ziv Bar-Yossef
  • Maxim Gurevich
Abstract

Many search engines and other web applications suggest auto-completions as the user types in a query. The suggestions are generated from hidden underlying databases, such as query logs, directories, and lexicons. These databases contain interesting and useful information, but they are typically not directly accessible. In this paper we describe two algorithms for sampling suggestions using only the public suggestion interface. One of the algorithms samples suggestions uniformly at random and the other samples suggestions proportionally to their popularity. These algorithms can be used to mine the hidden suggestion databases. Example applications include comparison of popularity of given keywords within a search engine’s query log, estimation of the volume of commercially oriented queries in a query log, and evaluation of the extent to which a search engine exposes its users to negative content. Our algorithms employ Monte Carlo methods in order to obtain unbiased samples from the suggestion database. Empirical analysis using a publicly available query log demonstrates that our algorithms are efficient and accurate. Results of experiments on two major suggestion services are also provided.
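
The idea of sampling through a public suggestion interface and correcting the bias with Monte Carlo weights can be illustrated with a small sketch. This is not the authors' algorithm. It is a simplified random walk down the prefix tree combined with importance weighting, run against a toy in-memory database. The names `TOY_DB`, `ALPHABET`, `K`, `suggest`, `sample_with_weight`, and `estimate_size` are all hypothetical, and the toy interface is assumed to list an exact match first.

```python
import random

ALPHABET = "abc"  # hypothetical toy alphabet
K = 3             # hypothetical cap on suggestions returned per prefix

# Hypothetical toy "hidden" database: query -> popularity.
TOY_DB = {"a": 5, "ab": 3, "abc": 1, "b": 2, "ba": 4, "bb": 1, "c": 1}


def suggest(prefix, db=TOY_DB, k=K):
    """Toy stand-in for a public suggestion interface.

    Returns at most k completions of `prefix`: an exact match first
    (an assumption of this sketch), then by descending popularity.
    """
    matches = [q for q in db if q.startswith(prefix)]
    matches.sort(key=lambda q: (q != prefix, -db[q], q))
    return matches[:k]


def sample_with_weight(rng, suggest_fn=suggest, k=K):
    """Run one random walk down the prefix tree.

    Returns (query, weight), where weight = 1 / P(walk returns query),
    or (None, 0.0) if the walk ends without reaching a suggestion.
    """
    prefix, prob = "", 1.0
    while True:
        results = suggest_fn(prefix)
        if not results:
            return None, 0.0  # dead end: no suggestion has this prefix
        if len(results) < k:
            # Fewer than k results: every completion is visible,
            # so pick one of them uniformly.
            return rng.choice(results), len(results) / prob
        # k results: the list may be truncated. Either stop at the
        # prefix itself or extend it by one character, all equally likely.
        prob /= len(ALPHABET) + 1
        step = rng.randrange(len(ALPHABET) + 1)
        if step == len(ALPHABET):
            if results[0] == prefix:
                return prefix, 1.0 / prob
            return None, 0.0  # stopped at a prefix that is not a suggestion
        prefix += ALPHABET[step]


def estimate_size(n, seed=0):
    """Estimate the number of suggestions as the mean importance weight."""
    rng = random.Random(seed)
    return sum(sample_with_weight(rng)[1] for _ in range(n)) / n


if __name__ == "__main__":
    print("true size:", len(TOY_DB))
    print("estimated size:", estimate_size(20000, seed=1))
```

Each suggestion is reachable by exactly one walk, so averaging the weights over many walks (counting failed walks as zero) gives an unbiased estimate of the database size. The same weights can be used to estimate other aggregates under the uniform distribution, for example the fraction of suggestions that contain a given keyword.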

Similar resources

Mining Navigation Behaviors for Term Suggestion of Search Engines

Query expansion is extensively applied in information retrieval systems, such as search engines. Most conventional approaches to query expansion have been developed based on textual analysis of documents. However, different issues such as segmentation and feature selection must be addressed, which might influence performance seriously. This work focuses mainly on avoiding the above problems of ...

External Search Engine Mining

Search engines maintain large amounts of valuable data, such as web content, user queries, clicks, and browsing trails. This data is fully accessible only to the search engines themselves. Other parties, like users, advertisers, and researchers, have very limited access to the data via public interfaces provided by search engines (e.g., the search interface). External techniques for mining searc...

Suggestions for Fresh Search Queries by Mining Mircoblog Topics

Query suggestion in Web search has been an effective approach to help users quickly express their information need and more accurately get the information they need. All major web-search engines and most proposed methods that suggest queries rely on the query logs of the search engine to determine possible query suggestions. However, for search systems, it is much more difficult to effectively suggest ...

Mining Generalized Query Patterns from Web Logs

User logs of a popular search engine keep track of user activities including user queries, user click-through from the returned list, and user browsing behaviors. Knowledge about user queries discovered from user logs can improve the performance of the search engine. We propose a data-mining approach that produces generalized query patterns or templates from the raw user logs of a popular comme...

Enhancing Web Search through Query Log Mining

A Web query log is a type of file that keeps track of the activities of the users of a search engine. Compared to the traditional information retrieval setting, in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clus...

Journal:
  • PVLDB

Volume 1

Publication date: 2008